I investigate practical consequences of a radical idea built into the foundations
of probability theory. The idea is that of embedding a stochastic system in an
ensemble of systems which all start in the same state but develop along different
trajectories. To understand how this idea was absorbed into the theory, the original
motivation for developing the concept of probability and expectation values is
reviewed in Sec. 1. Section 2 describes the St. Petersburg paradox, the first well
documented example of a situation where the use of ensembles leads to absurd
conclusions. Daniel Bernoulli's 1738 response to the paradox is presented in Sec. 3,
followed by a reminder of the more recent concept of ergodicity in Sec. 4, which
leads to an alternative resolution in Sec. 5 with the key Theorem 5.2. Section 6
explains the intriguing relation of this mathematically similar but conceptually wholly
different resolution to Bernoulli's work and resolves difficulties with unbounded
utility functions noted by Menger (1934). Section 7 concludes that the prominence in
economics of Bernoulli's early arguments may have contributed to poor risk
assessments of modern financial products, with consequences for market stability through
the effect of credit and leverage, as foreseen by writers as early as Adam Smith.

1. Origins of probability theory 

Formal and concrete concepts of likelihood were first developed in the context of
gambling - notable are the works by Pacioli (1494), by Cardano in the mid-16th
century (Ore, 1953), and the work by Pascal and Fermat in the summer of 1654. A
prominent question treated by Pacioli (1494) as well as Pascal and Fermat (1654)
is the following "problem of the points": imagine that a game of dice has to be
abandoned before it can be concluded. For instance, players may be betting money
on the highest score in rolling a die three times but have to leave after two throws.
In this situation, how is the "pot", the total wager, to be distributed among the
players in a fair manner?

The first observation is that this is a moral question. Mathematics may aid in 
answering it, but cannot resolve it without appealing to external information, as 
any answer must depend on the concept of fairness. It could be perceived as fair 
that the player with the most points is entitled to the total wager. Another concept 
of fairness would be to call the game inconclusive and return to each player his 
or her individual wager, or the pot may be split equally between the participants. 
Apparently, at least until the 17th century, there was no universal agreement on
the relevant concept of fairness. Pacioli (1494), for instance, argued that the fair
solution is to divide the pot in proportion to the points that each player has accrued
when the game is interrupted, see (Devlin, 2008), p. 15.



A century and a half later Pascal was approached by Chevalier de Méré to
produce a conclusive argument based on mathematics that would settle the issue. 
Pascal and Fermat corresponded on the subject and agreed that the fair solution is 
to give to each player the expectation value of his winnings. The expectation value 
they computed is an ensemble average, where all possible outcomes of the game are 
enumerated, and the product of winnings and probabilities associated with each 
outcome for each player are added up. This procedure uses the then revolutionary 
idea of parallel universes. Instead of considering only the state of the universe as it 
is, or will be, an infinity of additional equally likely universes is imagined. Any of 
these additional universes, for all we know, could be reality (i.e. the world as it will 
be). The proportion of those universes where some event occurs is the probability 
of that event. We will see that in the 18th century Bernoulli noticed undesired
properties of the ensemble average, and in the 19th century Boltzmann began to
specify conditions for its applicability. 

The 1654 investigation, which is generally considered the beginning of proba- 
bility theory, was concerned with a specific problem. It did not attempt to make 
any predictions, for instance involving repetitions of the game, but solely gave 
quantitative guidelines where individuals had incompatible moral intuitions. Moral 
considerations were certainly at the heart of the early debate. Pascal famously used 
expectation values to argue in favor of his religious beliefs, and much of Cardano's 
work on gambling is concerned with morals. He came very close to defining a fair 
game as one where no player has an expected advantage over others: "To the extent
to which you depart from that equality, if it is in your opponent's favor, you are a
fool, if in your own, you are unjust" (Ore, 1953), p. 189.



Following Pascal's and Fermat's work, however, it did not take long for others to
recognize the potential of their investigation for making predictions. Halley (1693),
writing in these pages 318 years ago, built on earlier work by Graunt (1662) and
devised a method for pricing life annuities. The idea of embedding reality in infinitely
many possible alternatives was revolutionary in 1654, it was essential in the
development of statistical mechanics in the 19th century (Ehrenfest and Ehrenfest, 1912;
Cohen, 1996), and it continues to be a fruitful means of conceptualizing complex
and stochastic systems (Gell-Mann and Lloyd, 2004). Nonetheless the idea itself is
a dubious philosophical construct, justified empirically by the success that, under
appropriate conditions, comes with allowing the use of mathematical rigor.
Historically, it seems that the philosophical weakness was initially ignored in applications.
In Sec. 4 we will review an alternative conceptualization of randomness.

† See Devlin (2008) for a detailed historical account.

Huygens (1657) is credited with making the concept of expectation values
explicit and with first proposing an axiomatic form of probability theory. This was
helpful in developing the field mathematically, as results could now be proven to
be correct. On the other hand, by introducing an axiomatic system, correctness
becomes restricted to the context of the axioms themselves. A proven result in
probability theory follows from the axioms of probability theory, now usually those
of Kolmogorov (1933). It is related to reality only insofar as the relevant real
conditions are reflected by the axioms. Kolmogorov (1933) wrote "The theory of
probability [..] should be developed from axioms in exactly the same way as Geometry
and Algebra. This means that after we have defined the elements to be studied 
and their basic relations, and have stated the axioms by which these relations are 
to be governed, all further exposition must be based exclusively on these axioms, 
independent of the usual concrete meaning of these elements and their relations." 
He wrote that it would be a different "aim [..] to tie up as closely as possible the 
mathematical theory with the empirical development of the theory of probability." 

To summarize: the first systematic investigation into stochastic systems was 
concerned with moral advice. The second established an axiomatic system. 



2. The lottery 

The St. Petersburg paradox was first put forward by Nicolaus Bernoulli in 1713
(de Montmort, 1713), p. 402. He considered lotteries of the following type:
A fair coin is tossed. 

1) On heads, the lottery pays $1, and the game ends. On tails, the coin is tossed 
again. 

2) On heads, the lottery pays $2, and the game ends. On tails, the coin is tossed 
again. 

n) On heads, the lottery pays $2^{n-1}, and the game ends. On tails, the coin is
tossed again.

In other words, the random number of coin tosses, n, follows a geometric dis- 
tribution with parameter 1/2, and the payouts increase exponentially with n. We 
may call n a "waiting time" , although in this study it is assumed that the lottery 
is performed instantaneously, i.e. a geometric random variable is drawn and no 
significant physical time elapses. The expected payout from this game is 



\[
\sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} 2^{n-1} = \frac{1}{2} + \frac{1}{2} + \dots, \tag{2.1}
\]

which is a diverging sum. A rational person, N. Bernoulli argued, should therefore
be willing to pay any price for a ticket in this lottery. In reality, however, people
are rarely willing to pay more than $10, which constitutes the paradox.
Reactions to the paradox include the following: 

Even though the expected payout is infinite, there is not an infinite amount of
money or goods in the world to pay up. So the lottery is not realistic (Cramer, 1728).
If the payouts are limited to some realistic value, then the lottery's expected payout
is drastically reduced. For example, the 31st term in the sum (Eq. 2.1) comes from
a payout of about $10^9, so limiting payouts to $10^9 reduces the expected payout
from $∞ to less than $16. Similarly, one could argue that it is only too sensible to ignore
events with a probability of the order of 10^{-9} (Menger, 1934).
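
The effect of a payout cap can be checked with a short calculation. The sketch below is illustrative only and not part of the original argument; the function name `capped_expected_payout` and the convention that every outcome whose nominal payout exceeds the cap pays exactly the cap are assumptions made here.

```python
from fractions import Fraction


def capped_expected_payout(cap):
    """Exact expected payout when every payout 2**(n-1) is limited to `cap` dollars.

    Assumption: outcomes whose nominal payout exceeds the cap pay exactly `cap`.
    """
    expected = Fraction(0)
    n = 1
    # Each outcome whose payout does not exceed the cap contributes exactly 1/2.
    while 2 ** (n - 1) <= cap:
        expected += Fraction(1, 2 ** n) * 2 ** (n - 1)
        n += 1
    # All remaining outcomes have total probability 2**-(n-1) and pay the cap.
    expected += Fraction(cap, 2 ** (n - 1))
    return expected


# Cap of one billion dollars: 15 from the first 30 terms plus a capped tail below 1.
print(float(capped_expected_payout(10 ** 9)))
```

Under this convention a cap of $10^9 gives 15 + 10^9/2^30, just under $16; if the capped tail is dropped entirely, the first 30 terms alone give exactly $15.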

Another argument is that no one would offer such a lottery because it carries an
infinite expected loss for the lottery-seller, which makes it irrelevant (Samuelson,
1960).



3. Bernoulli's resolution 



The quantity calculated in (Eq. 2.1) is usually called an "expected" payout. But
since it fails to capture the reality of the situation its conceptual validity must be
questioned. Bernoulli (1738) noted

"§1 Ever since mathematicians first began to study the measurement of risk there 
has been general agreement on the following proposition: Expected values are com- 
puted by multiplying each possible gain by the number of possible cases where, 
in this theory, the consideration of cases which are all of the same probability is 
insisted upon."

Indeed, Huygens (1657) had postulated: "if any one should put 3 shillings in
one hand without telling me which, and 7 in the other, and give me choice of either 
of them; I say, it is the same thing as if he should give me 5 shillings..." This 
concept of expectation is agnostic regarding fluctuations, which is harmless only if 
the consequences of the fluctuations, such as associated risks, are negligible. This is 
usually the case in small-stakes recreational gambling as considered in the earliest
studies of chance by Pacioli (1494), Cardano (Ore, 1953), and Fermat and Pascal
(1654), mentioned in Sec. 1, but it is not the case in the St. Petersburg paradox.
Noticing that the ability to bear risk depends not only on the risk but also on the
risk-bearer's resources, Bernoulli (1738) wrote under §3:

"If I am not wrong then it seems clear that all men cannot use the same rule to 
evaluate the gamble. The rule established in §1 must, therefore, be discarded."

Bernoulli, and shortly before him Cramer (1728), drew attention to psychological
and behavioral issues involved in the evaluation of the proposed lottery. The
desirability or "utility" associated with a financial gain, they argued, depends not 
only on the gain itself but also on the wealth of the person who is making this 
gain. Instead of computing the expectation value of the monetary winnings, they 
proposed to compute instead the expectation value of the gain in utility. To this 
end the utility function u(w) was introduced, which specifies the utility of a wealth 
of $w. 

Since an extra dollar is generally worth less to a rich person than to a poor 
person, u(w) is assumed to be concave, such that du/dw is monotonically decreasing.
While exceptional circumstances can render this assumption invalid (Bernoulli cites
an imprisoned rich man who only needs another 2,000 ducats to buy his freedom), it
is well confirmed behaviorally. Otherwise u(w) is only loosely constrained. Bernoulli
(1738) suggested the logarithmic function \(u_B(w) = \ln(w)\), while Cramer (1728) had
proposed using \(u_C(w) = \sqrt{w}\) instead. Bernoulli's proposition of the logarithm was
based on the intuition that the increase in wealth should correspond to an increase
in utility that is inversely proportional to the wealth a person already has,
\(du/dw = 1/w\), whose solution is the logarithm.

Bernoulli (1738) thus "discarded the rule" (for calculating expected gains in
wealth) by replacing the object whose expectation value was to be calculated.
Instead of gains in wealth, he decided to focus on the expectation of gains in some
function of wealth.

In Sec. 5 we will also discard the rule established in §1 of (Bernoulli, 1738),
but not by replacing the object whose average is to be calculated, i.e. not by
replacing plain monetary gains by a function of those gains. Instead we will replace
the type of average, using the time average instead of the ensemble average. This
is necessary because the system under investigation (the dynamics of monetary
wealth) is not ergodic, as will be shown in Sec. 5. In doing so we will critique the
implicit considering of multiple imagined systems, or parallel universes. 

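But first, applying Bernoulli's reasoning, we compute the expected change in
logarithmic utility, \(\langle \Delta u_B \rangle\), due to playing the lottery, given the initial wealth $w
and the cost of a ticket in the lottery $c,

\[
\langle \Delta u_B \rangle = \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} \Big( \overbrace{\ln(w - c + 2^{n-1})}^{\text{Utility after the game}} - \underbrace{\ln(w)}_{\text{Utility before the game}} \Big). \tag{3.1}
\]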



This sum converges (as long as each individual term is finite), as is readily shown 
using the ratio test. Depending on w and c, the quantity can be positive or negative, 
reflecting expected gain or loss of utility. Assuming that potential lottery players 
base their decisions not on the expected monetary gain but instead on the expected 
gain in usefulness, and that that usefulness is appropriately represented by \(u_B\), the
paradox is thus resolved.
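
The sign behavior of (Eq. 3.1) can be explored numerically. The following is a minimal sketch, not taken from the paper: the function name `expected_log_utility_change` and the truncation parameter `n_terms` are assumptions introduced for illustration, and the series is simply cut off after a fixed number of terms.

```python
import math


def expected_log_utility_change(w, c, n_terms=200):
    """Partial sum of Eq. (3.1): sum of (1/2)**n * (ln(w - c + 2**(n-1)) - ln(w)).

    `n_terms` is an illustrative truncation; for the default value the
    neglected tail is far below double precision.
    """
    total = 0.0
    for n in range(1, n_terms + 1):
        wealth_after = w - c + 2 ** (n - 1)
        if wealth_after <= 0:
            # The logarithm is undefined here, so this term is not finite.
            raise ValueError("non-positive wealth after the game for n = %d" % n)
        total += 0.5 ** n * (math.log(wealth_after) - math.log(w))
    return total


print(expected_log_utility_change(100, 2))   # positive: utility is expected to rise
print(expected_log_utility_change(100, 50))  # negative: the ticket is too expensive
```

The explicit error for non-positive wealth mirrors the caveat that the sum converges only as long as each individual term is finite.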

It is dissatisfying that this resolution of the paradox relies on a function u(w) 
that is postulated and, in the framework of Cramer and Bernoulli, cannot be derived
from more fundamental considerations. Disagreements on whether the assumptions
(the characteristics of diminishing marginal utility of wealth) are realistic are
difficult to settle. Anticipating this objection, Bernoulli (1738) - Daniel being perhaps
less mathematician than scientist - appealed to observations: "Since all our
propositions harmonize perfectly with experience it would be wrong to neglect them as
abstractions resting upon precarious hypotheses." 

The responses to the paradox mentioned at the end of Sec. 2 are similarly
dissatisfying - they address the relevance of the problem and argue that it would never
really arise, but they do not resolve it. Since the paradoxical aspect is the behavior 
of real people, however, these arguments are valid, and all means of disposing of 
the paradox could be similar in character. 

While Bernoulli's observations of human risk aversion and even the functional 
form he proposed for modeling these are "correct" in a specific sense elaborated
below, these behavioral regularities have a physical reason that Bernoulli failed to
point out. In fact, it appears that he was not aware of this physical reason, which
justifies only \(u_B(w) = \ln(w)\). Bernoulli (1738) did not consider the logarithmic
form of utility essential and wrote of Cramer's work, which uses \(u_C(w) = \sqrt{w}\):
"Indeed I have found his theory so similar to mine that it seems miraculous that 
we independently reached such close agreement on this sort of subject." 



4. Ergodicity 

The question of ergodicity in stochastic systems is concerned with a conceptual 
choice in giving meaning to quantitative probabilities. It can be argued that it is 
meaningless to assign a probability to a single event, and that any decision
regarding a single event must resort to intuition or morals. For mathematical guidance the
event has to be embedded within other similar events. Fermat and Pascal (1654)
chose to embed within parallel universes, but alternatively - and often more
meaningfully - we can embed within time. The concept of a decision regarding a single
isolated event, whether probabilistic or not, seems dubious: how do we interpret 
the premise of isolation? Surely the event is part of a history. Does the individual 
making the decision die immediately after the event? In general the consequences 
of the decision will unfold over time. 



The origins of ergodic theory lie in the mechanics of gases (Uffink, 2004). One is
interested in large-scale effects of the molecular dynamics, i.e. in the thermodynamic 
variables. For instance, the macroscopic pressure of a gas is a rate per area of 
molecular momentum transfer to a container wall, averaged over an area that is 
large compared to the typical distance between molecules and over a time that is 
long compared to the typical interval between molecular impacts in the area. 

Since the number of particles is large and collisions are possible, however, it is 
practically not possible to explicitly solve the microscopic equations of motion. Full 
information about the state x (positions and momenta of all molecules) is not
available, and the time average, for instance of momentum transfer to a container wall,
cannot be computed directly. Boltzmann (1871) and Maxwell (1879) independently
replaced the physically required time average by the average over an ensemble of
appropriately weighted states x, making use of Huygens' expectation value. The 
weight of the different states x in the ensemble was postulated and subsequently 
justified empirically by comparing predictions to observations. 

The key rationale behind this dramatic step is that the systems considered 
are in equilibrium: the macroscopic variables of interest do not change in time,
and microscopic fluctuations obey detailed balance, see e.g. (van Kampen, 1992).
Under these strict conditions, time has little tangible effect, and we may get away
with disregarding it completely. Nonetheless, both Boltzmann and Maxwell were 
concerned that for mathematical convenience they were using the a priori irrelevant 
ensemble average. 



Specifically, when Boltzmann (1871) suggested treating a gas as a collection
of many systems, namely sub-volumes which can be thought of as a probabilistic
ensemble, he warned that using this "trick" means "to assume that between [...]
the various [...] systems no interaction ever occurs." The requirement of absolutely
no interaction between a collection of systems is equivalent, in practical terms, to 
the non-existence of all these systems from each other's perspectives - if systems A 
and B cannot ever interact in any way, then to system A, for all practical purposes,
system B does not exist, and vice versa. Another way of putting this is that systems
A and B are parallel universes. 

Assuming the validity of this procedure is known as the ergodic hypothesis. It is
permissible under strict conditions of stationarity, see e.g. Grimmett and Stirzaker
(2001), Ch. 9.5. These conditions were understood long after the St. Petersburg
paradox had been introduced (Birkhoff, 1931a,b; von Neumann, 1932a,b).



Much of the literature on ergodic systems is concerned with deterministic dy- 
namics, but the basic question whether time averages may be replaced by ensemble 
averages is equally applicable to stochastic systems, such as Langevin equations or 
lotteries. The essence of ergodicity is the question whether the system when
observed for a sufficiently long time t samples all states in its sample space in such a
way that the relative frequencies f(x, t)dx with which they are observed approach
a unique (independent of initial conditions) probability, P(x)dx,

\[
\lim_{t \to \infty} f(x, t) = P(x). \tag{4.1}
\]

If this distribution does not exist or is not unique, the time average,
\(\bar{A} = \lim_{T \to \infty} \frac{1}{T} \int_0^T A(x, t)\,dt\), of an observable A cannot be
computed as an ensemble average in Huygens' sense,
\(\langle A \rangle = \int_x A(x, t) P(x)\,dx\). The generic variable A may depend on time only through
its state dependence, or it may have explicit time dependence. If P(x) is not unique,
then the time average of A generally depends on initial conditions. If P(x) does
not exist, there may still be a unique time average. A unique ensemble average may
also still exist - although we cannot find P(x) from (Eq. 4.1), we may be able to
determine P(x, t), the proportion of systems in an ensemble that are in state x
at time t, and compute the ensemble average as \(\langle A \rangle(t) = \int_x A(x, t) P(x, t)\,dx\). In
special cases the time dependencies of A(x, t) and P(x, t) can be such that \(\langle A \rangle(t)\)
does not actually depend on time. However, there is no guarantee in these cases
that the time average and ensemble average will be identical.
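
For the lottery's waiting time n, the convergence of relative frequencies described by (Eq. 4.1) can be observed numerically. The sketch below is an illustration under stated assumptions: rounds are independent, the coin is simulated with Python's `random` module, and `waiting_time` and `relative_frequencies` are hypothetical helper names that do not appear in the paper.

```python
import random
from collections import Counter


def waiting_time(rng):
    """Number of tosses in one round: toss until the first heads."""
    n = 1
    while rng.random() < 0.5:  # tails with probability 1/2: toss again
        n += 1
    return n


def relative_frequencies(rounds, seed=0):
    """Relative frequency of each waiting time n observed over `rounds` rounds."""
    rng = random.Random(seed)
    counts = Counter(waiting_time(rng) for _ in range(rounds))
    return {n: k / rounds for n, k in sorted(counts.items())}


freqs = relative_frequencies(100_000)
for n in range(1, 6):
    # Observed frequency next to the probability (1/2)**n of waiting time n.
    print(n, round(freqs.get(n, 0.0), 4), 0.5 ** n)
```

Here the observed quantity is drawn afresh in every round, so frequencies over time settle on the probabilities (1/2)^n. The example says nothing about wealth, which evolves multiplicatively from round to round.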

Growth factors in the St. Petersburg lottery are such a special case. In Sec. 5
it will be shown that while the (a priori irrelevant) ensemble-average winnings
from the lottery diverge, the time-average winnings do not. Mathematically the
end result is identical to the result obtained by Bernoulli (although see Sec. 6(a)).
Conceptually, however, the arbitrary utility (arbitrary in the sense that it depends
on personal characteristics) is replaced by an argument based on the physical reality
of the passing of time and the fact that no communication or transfer of resources
is possible between the parallel universes introduced by Fermat.



(a) The economic context 

To repeat, the quantity in (Eq. 2.1) is accurately interpreted as follows: imagine
a world of parallel universes defined such that every chance event splits our current 
universe into an ensemble containing member-universes for every possible outcome 
of the chance event. We further require that the proportion of members of the 
ensemble corresponding to a particular outcome is the probability of that outcome. 
In this case, if we give the same weight to every member-universe, (Eq. 2.1) is the
ensemble average over all possible future states of the universe (i.e. states after the
game).

Of course, we are not a priori interested in such an average because we cannot 
realize the average payout over all possible states of the universe. Following the
arguments of Boltzmann and Maxwell, this quantity is meaningful only in two
cases.



1. The physical situation could be close to an ensemble of non-interacting sys- 
tems which eventually share their resources. This would be the case if many 
participants took part in independent rounds of the lottery, with an agreement 
to share their payouts, which would be a problem in portfolio construction, 
and different from Bernoulli's setup†.

2. The ensemble average could reflect the time-average performance of a single 
participant in the lottery. Whereas time averages in statistical mechanics are 
often difficult to compute (hence the ergodic hypothesis), the simplicity of 
the St. Petersburg lottery makes it easy to compute them and see how they 
differ from ensemble averages. 



Thus neither case applies to the St. Petersburg lottery, and the ensemble average 
is irrelevant to the decision whether to buy a ticket. 

In general, to realize an average over the ensemble, ensemble members must 
exchange resources, but this is often impossible, so we must be extremely careful 
when interpreting ensemble averages of the type of (Eq. 2.1).



5. Resolution using non-ergodicity 

The resolution of the St. Petersburg paradox presented in this section builds on the 
following alternative conceptualization: 

• Rejection of parallel universes: To the individual who decides whether to pur- 
chase a ticket in the lottery, it is irrelevant how he may fare in a parallel 
universe. Huygens' (or Fermat's) ensemble average is thus not immediately 
relevant to the problem. 

• Acceptance of continuation of time: The individual regularly encounters sit- 
uations similar to the St. Petersburg lottery. What matters to his financial 
well-being is whether he makes decisions under uncertain conditions in such 
a way as to accumulate wealth over time. 

Similarly, in statistical mechanics Boltzmann and Maxwell were interested in mo- 
mentum accumulated over time. Because they considered equilibrium systems, 
where time is largely irrelevant, they hypothesized that time averages could be 
replaced by ensemble averages. However, a person's wealth is not usually in equi- 
librium, nor even stationary: on the time scales of interest, it generally grows or 
shrinks instead of fluctuating about a long-time average value. Therefore the
ergodic hypothesis does not apply (Peters, 2010). Consequently, there is no reason to
believe that the expected (ensemble-average) gain from the lottery coincides with
the time-average gain. That they are indeed different will be shown in this section
by explicitly calculating both.

† This situation is equivalent to a single person buying tickets for many parallel rounds of the
lottery. In the limit of an infinitely rich person and a finite ticket price it can be shown that it is
advisable to invest all funds in such independent lotteries.

The accumulation of wealth over time is well characterized by an exponential
growth rate. To compute this, we consider the factor by which a player's wealth
changes in one round of the lottery‡,

\[
r_i = \frac{w - c + m_i}{w}, \tag{5.1}
\]

where, as in (Eq. 3.1), $w is the player's wealth before the round of the lottery, $c
is the cost of a lottery ticket, and $m_i is the payout from that round of the lottery.
To convert this factor into an exponential growth rate g (so that exp(gt) is the
factor by which wealth changes in t rounds of the lottery), we take the logarithm,
\(g_i = \ln(r_i)\).§
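
As a concrete illustration of the growth factor in (Eq. 5.1) and its logarithm, the sketch below evaluates both for a single round. The function names and the numbers used (wealth $100, ticket price $10) are assumptions chosen for illustration and do not come from the paper.

```python
import math


def growth_factor(w, c, payout):
    """Eq. (5.1): factor by which wealth w changes after paying c and receiving payout."""
    return (w - c + payout) / w


def growth_rate(w, c, payout):
    """Per-round exponential growth rate, g_i = ln(r_i)."""
    return math.log(growth_factor(w, c, payout))


# Illustrative numbers (assumed, not from the paper): wealth $100, ticket $10.
print(growth_factor(100, 10, 4))   # 0.94: the round ended on the third toss
print(growth_rate(100, 10, 4))     # negative, since wealth shrank
print(growth_factor(100, 10, 64))  # 1.54: the round ended on the seventh toss
```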

(a) Ensemble Average 
Theorem 5.1. The ensemble-average exponential growth rate in the St. Petersburg
lottery is \(\langle g \rangle = \ln\left( \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} \frac{w - c + 2^{n-1}}{w} \right)\).

Proof. First, we consider the ensemble-average growth factor, and begin by
averaging over a finite sample of N players, playing the lottery in parallel universes,
i.e. in general players will experience different sequences of coin tosses,

\[
\langle r \rangle_N = \frac{1}{N} \sum_{i=1}^{N} r_i, \tag{5.2}
\]

which defines the finite-sample average \(\langle \cdot \rangle_N\). We change the summation in (Eq. 5.2)
to run over the geometrically distributed number of coin tosses in one round, n,

\[
\langle r \rangle_N = \sum_{n=1}^{n^{\max}} \frac{k_n}{N} r_n, \tag{5.3}
\]

where \(k_n\) is the frequency with which a given n, i.e. the first heads-event on the
nth toss, occurs in the sample of N parallel universes, and \(n^{\max}\) is the highest n
observed in the sample. Letting N grow, \(k_n/N\) approaches the probability of n, and
we obtain a simple number, the ensemble-average growth factor \(\langle r \rangle\), rather than a
stochastic variable \(\langle r \rangle_N\),

\[
\langle r \rangle = \lim_{N \to \infty} \langle r \rangle_N = \lim_{N \to \infty} \sum_{n=1}^{n^{\max}} \frac{k_n}{N} r_n = \sum_{n=1}^{\infty} p_n r_n. \tag{5.4}
\]

‡ One "round" of the lottery is used here to mean one sequence of coin tosses until a
heads-event occurs. Throughout, an index i refers to such rounds, whereas n indicates waiting times -
the number of times a coin is tossed in a given round.

§ The logarithm is taken to facilitate comparison with Bernoulli's analysis. As long as it acts
on the average, as opposed to being averaged over, it does not change the convergence properties.

The logarithm of \(\langle r \rangle\) expresses this as the ensemble-average exponential growth
rate. Using (Eq. 5.1) and writing the probabilities explicitly, we obtain

\[
\langle g \rangle = \ln\left( \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} \frac{w - c + 2^{n-1}}{w} \right). \tag{5.5}
\]

□

Since the ensemble-average payout from one round in the lottery diverges,
(Eq. 2.1), so does this corresponding ensemble-average exponential growth rate
(Eq. 5.5).
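
The divergence can be made tangible by evaluating partial sums of the series inside the logarithm of (Eq. 5.5). The sketch below is illustrative only; the function name `ensemble_growth_factor_partial` is hypothetical, and w and c are assumed to be integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def ensemble_growth_factor_partial(w, c, n_max):
    """Sum over n = 1..n_max of (1/2)**n * (w - c + 2**(n-1)) / w, computed exactly.

    Assumption: w and c are integers, so every term is an exact fraction.
    """
    total = Fraction(0)
    for n in range(1, n_max + 1):
        total += Fraction(1, 2 ** n) * Fraction(w - c + 2 ** (n - 1), w)
    return total


# Each additional term adds roughly 1/(2w), so the partial sums grow without bound.
for n_max in (10, 100, 1000):
    print(n_max, float(ensemble_growth_factor_partial(100, 2, n_max)))
```

The partial sum up to n_max equals ((w - c)(1 - 2^{-n_max}) + n_max/2)/w, which grows linearly in n_max; its logarithm therefore diverges as well, only more slowly.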

(b) Time average

Theorem 5.2. The time-average exponential growth rate in the St. Petersburg
lottery is \(\bar{g} = \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} \ln(w - c + 2^{n-1}) - \ln(w)\).

Proof. The time average is computed in close analogy to the way the ensemble
average was computed. After a finite number T of rounds of the game a player's
wealth reaches

\[
w(T) = w \prod_{i=1}^{T} r_i. \tag{5.6}
\]

The Tth root of the total fractional change,

\[
\bar{r}_T = \left( \prod_{i=1}^{T} r_i \right)^{1/T}, \tag{5.7}
\]

which defines the finite-time average \(\bar{r}_T\), is the factor by which wealth has grown on
average in one round of the lottery over the time span T. We change the product
to run over n,

\[
\bar{r}_T = \left( \prod_{n=1}^{n^{\max}} r_n^{k_n} \right)^{1/T}, \tag{5.8}
\]

where \(k_n\) is the frequency with which a given n occurred in the sequence of T
rounds, and \(n^{\max}\) is the highest n observed in the sequence. Letting T grow, \(k_n/T\)
approaches the probability of n, and we obtain a simple number, the time-average
growth factor \(\bar{r}\), rather than a stochastic variable \(\bar{r}_T\),

\[
\bar{r} = \lim_{T \to \infty} \bar{r}_T = \lim_{T \to \infty} \prod_{n=1}^{n^{\max}} r_n^{k_n/T} = \prod_{n=1}^{\infty} r_n^{p_n}. \tag{5.9}
\]

† A possible objection to this statement is discussed below.

The logarithm of \(\bar{r}\) expresses this as the time-average exponential growth rate.
Using (Eq. 5.1) and writing the probabilities explicitly, we obtain

\[
\begin{aligned}
\bar{g} &= \ln\left( \prod_{n=1}^{\infty} r_n^{p_n} \right) \\
&= \sum_{n=1}^{\infty} p_n \ln r_n \\
&= \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n} \left( \ln(w - c + 2^{n-1}) - \ln(w) \right).
\end{aligned} \tag{5.10}
\]

□

The final line of (Eq. 5.10) is identical to the right-hand side of (Eq. 3.1).
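
The statement of Theorem 5.2 can be compared with a direct simulation. The sketch below is a minimal illustration, not the paper's own computation: it assumes that every simulated round is played at the same wealth w and ticket price c (equivalent lotteries), truncates the series after `n_terms` terms, and uses hypothetical function names throughout.

```python
import math
import random


def time_average_growth_rate(w, c, n_terms=200):
    """Truncated series for Eq. (5.10): sum of (1/2)**n * ln((w - c + 2**(n-1)) / w)."""
    return sum(
        0.5 ** n * math.log((w - c + 2 ** (n - 1)) / w)
        for n in range(1, n_terms + 1)
    )


def simulated_growth_rate(w, c, rounds, seed=0):
    """Average of ln(r_i) over `rounds` independent rounds played at the same w and c.

    This equals the logarithm of the finite-time average growth factor in Eq. (5.7).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        n = 1
        while rng.random() < 0.5:  # tails: toss again
            n += 1
        total += math.log((w - c + 2 ** (n - 1)) / w)
    return total / rounds


print(time_average_growth_rate(100, 2))
print(simulated_growth_rate(100, 2, rounds=200_000))
```

For long sequences the simulated value settles near the series value, in contrast to the sample mean of the growth factors themselves, which is dominated by rare large payouts.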

Again the quantity can be positive or negative, but instead of the ensemble 
average of the change in utility we have calculated the time-average exponential 
growth rate of a player's wealth without any assumptions about risk preferences 
and personal characteristics. If the player can expect his wealth to grow over time, 
and he has no other constraints, he should play the game; if he expects to lose 
money over time, he should not play. The loci of the transition between growth 
and decay, where \(\bar{g} = 0\), define a line in the c vs. w plane, which is shown in Fig. 1.
Equation (5.10) depends on the player's wealth - keeping c > 1 fixed, \(\bar{g}\) initially
increases with w, see inset of Fig. 1 for the example of c = 2. This is because the
wealthy player keeps most of his money safe, and a loss does not seriously affect
his future ability to invest. For the very wealthy player, neither a win nor a loss
is significant, and the time-average exponential growth rate asymptotes to zero as
w → ∞. At the other extreme, a player whose wealth w ≤ c - 1 risks bankruptcy,
which in a sense means the end to his economic life, and \(\lim_{w \to (c-1)^{+}} \bar{g} = -\infty\).

So (Eq. 5.10) can also be considered a criterion for how much risk a person should
take. The cost of the ticket, c, is the exposure to the lottery. For fixed (positive)
w "buying" a ticket is always advisable if c = 0, see Sec. 6(a). As c increases, \(\bar{g}\)
will eventually become negative as the exposure, or risk, becomes too large. The
time resolution discourages entering into any gamble where bankruptcy, i.e. zero
or negative wealth after the game, occurs with non-zero probability. In these cases
individual terms in the sum in (Eq. 5.10) are undefined.
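
The break-even ticket price for a given wealth, i.e. the point where the growth rate of (Eq. 5.10) changes sign, can be located numerically. The sketch below uses plain bisection; this is an illustrative choice rather than the method used to produce the figure, and the function names, the truncation `n_terms`, and the tolerance `tol` are all assumptions.

```python
import math


def time_average_growth_rate(w, c, n_terms=200):
    """Truncated series for Eq. (5.10): sum of (1/2)**n * ln((w - c + 2**(n-1)) / w)."""
    return sum(
        0.5 ** n * math.log((w - c + 2 ** (n - 1)) / w)
        for n in range(1, n_terms + 1)
    )


def break_even_price(w, tol=1e-10):
    """Ticket price c at which the time-average growth rate is zero, by bisection.

    The growth rate is positive at c = 0, decreases as c grows, and tends to
    minus infinity as c approaches w + 1, so a sign change lies in between.
    """
    lo, hi = 0.0, w + 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if time_average_growth_rate(w, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for wealth in (100, 1000):
    print(wealth, break_even_price(wealth))
```

With these assumptions the break-even price lies between $4 and $5 for a wealth of $100 and is higher for a wealth of $1000, in line with the statement that the wealthier player can accept a larger exposure.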

Equation (I5.10|) . may seem an unnatural criterion for the following reason: the 
lottery is played only once with wealth w. By the next round, wealth has changed 
by mi — c, and the situation has to be re-evaluated. So in reality, although we may 
buy tickets at the same price repeatedly, a combination of different g (resulting 
from different "initial" wealths w) will be realized. However, over a sufficiently 
long time, we must assume that we will face equivalent decisions again, and thus 
play equivalent lotteries again. Let's consider the result of playing many rounds in 
different lotteries, \(\prod_{i=1}^{T} r_i\). Because of commutativity we can rearrange the factors
in the product so that the first T' factors correspond to the rounds in which we
face equivalent lotteries (for instance we have the same wealth, and the tickets
are offered at the same price), and the remaining T - T' factors refer to different
situations,

\[ \prod_{i=1}^{T} r_i = \prod_{j=1}^{T'} r_j \prod_{m=T'+1}^{T} r_m . \tag{5.11} \]

Figure 1. Equation (5.10) (or (Eq. 3.1)) defines a relationship between w and c,
where g(w,c) = 0, i.e. the player breaks even over time (or his ensemble-average
logarithmic utility change is zero) if he pays $c given his wealth $w. Inset:
Time-average exponential growth rate (or ensemble-average logarithmic utility change),
\(g(w, c=2) = \langle \Delta u_B \rangle (w, c=2)\), for the St. Petersburg lottery as a function of wealth,
$w, with a ticket price of $c = $2. If the player risks bankruptcy by purchasing a ticket,
\(g(w,c) \to -\infty\). To the infinitely wealthy player a gain or loss is irrelevant, and \(g(w,c) \to 0\).

Whatever \(\prod_{m=T'+1}^{T} r_m\) may be, the steps in (Eq. 5.9) apply to the first product, and
the sign of the quantity in (Eq. 5.10), which determines whether the first product
is greater or smaller than one, is a good criterion for deciding whether to buy a 
ticket. 
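
The behavior of the first product can be illustrated by simulation. The sketch below assumes that every one of the T' rounds is an equivalent lottery, i.e. the same wealth w and ticket price c each time, and compares the logarithm of the geometric mean of the sampled growth factors with the sum over n of p_n ln r_n. All function names, the seed and the truncation `n_max` are hypothetical choices made for illustration.

```python
import math
import random


def sample_growth_factor(w, c, rng):
    """Draw one per-round growth factor r = (w - c + 2**(n-1)) / w.

    n is the waiting time until the first heads of a fair coin, so that
    P(n) = (1/2)**n as in the St. Petersburg lottery.
    """
    n = 1
    while rng.random() < 0.5:   # tails: keep tossing
        n += 1
    return (w - c + 2.0 ** (n - 1)) / w


def simulated_growth_rate(w, c, rounds, seed=0):
    """(1/T') * sum of ln r_j over T' equivalent rounds, i.e. the logarithm
    of the geometric mean of the first product in Eq. (5.11)."""
    rng = random.Random(seed)
    return sum(math.log(sample_growth_factor(w, c, rng))
               for _ in range(rounds)) / rounds


def expected_log_growth(w, c, n_max=200):
    """Sum over n of p_n * ln r_n, truncated at n_max (an assumption)."""
    return sum(0.5 ** n * math.log((w - c + 2.0 ** (n - 1)) / w)
               for n in range(1, n_max + 1))


if __name__ == "__main__":
    w, c = 100.0, 10.0          # requires w - c + 1 > 0 (no bankruptcy)
    print("simulated:", simulated_growth_rate(w, c, rounds=200_000))
    print("sum p_n ln r_n:", expected_log_growth(w, c))
```

The two numbers agree only up to sampling noise, which shrinks as the number of equivalent rounds grows.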

It is instructive to calculate the time-average exponential growth rate in another 
way. Line 2 of (Eq. 5.10) looks like an ensemble-average exponential growth rate,
obtained by computing exponential growth rates for individual systems and then 
averaging those over the ensemble, 

\[ \langle g \rangle = \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \ln(r_i). \tag{5.12} \]

The reason why this quantity is not the ensemble average but the time average is 
subtle. There is no limit \(T \to \infty\), so how can this be a time average?

Averages extract deterministic parameters from stochastic processes. The ensemble
average does this by considering an infinity of parallel universes, and the
time average does it by considering infinitely many time steps. But to find the time
average it is not necessary for time itself to approach infinity. Instead, the unit of
time can be rescaled. It is only necessary that all possible scenarios occur exactly
with the appropriate frequencies during the sampling time. As long as the effect
of time - of events happening sequentially - is accounted for, this will lead to the
time average.

Article submitted to Royal Society

We have used one round of the lottery as one time unit. Thus the return from 
one time unit will be one of the possible returns \(r_n\). If we observe only one time unit,
the best estimate for the time-average return would be the return that happened
to be realized in that time step. An approximate estimate for the time-average
exponential growth rate is thus \(g^{\mathrm{est}}_1 \approx r_1 - 1\).†

To improve this estimate, we pick q returns \(r_j\) at random and in agreement with
the probabilities \(p_n\), and let each return act for 1/q time units.‡ The total time that
passes during the experiment is kept fixed but we separate it into q sub-intervals 
of time. The result will be 

\[ g^{\mathrm{est}}_q = \sum_{j=1}^{q} \left( r_j^{1/q} - 1 \right). \tag{5.13} \]

The proportion of sub-intervals during which return \(r_n\) is realized will approach \(p_n\)
as \(q \to \infty\). In this limit we can therefore replace the sum over time steps by a sum
over n as follows 



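\begin{align}
\lim_{q\to\infty} g^{\mathrm{est}}_q &= \lim_{q\to\infty} \sum_{j=1}^{q} \left( r_j^{1/q} - 1 \right) \tag{5.14} \\
&= \sum_{n=1}^{\infty} \lim_{q\to\infty} k_n \left( r_n^{1/q} - 1 \right) \tag{5.15} \\
&= \sum_{n=1}^{\infty} p_n \lim_{q\to\infty} q \left( r_n^{1/q} - 1 \right), \tag{5.16}
\end{align}

where \(k_n\) once again is the frequency with which a given n occurs, now in the sample
of q sub-intervals. Using the definition of the logarithm, \(\ln(r_n) = \lim_{q\to\infty} q \left( r_n^{1/q} - 1 \right)\),
yields

\[ \lim_{q\to\infty} \sum_{j=1}^{q} \left( r_j^{1/q} - 1 \right) = \sum_{n=1}^{\infty} p_n \ln r_n, \tag{5.17} \]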

meaning that the time-average exponential growth rate, derived by splitting a time 
unit into infinitely many sub-intervals and playing through all possible scenarios 
in these sub-intervals can be written as the expectation value of the logarithm of 
returns. A limit which is equivalent to the apparently missing limit \(T \to \infty\) is
implied by the logarithm and evaluated before the explicit limit in (Eq. 5.12). Thus
(Eq. 5.12) is an ensemble average of a time average, which is nothing but a time
average. 
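
The convergence in q can be made concrete with a small numerical check. The sketch below assumes that each return r_n occupies exactly the fraction p_n of the q sub-intervals (k_n = p_n q), truncates the lottery at `n_max` outcomes, and uses illustrative values w = 100 and c = 10. These choices and all function names are assumptions made for the example.

```python
import math


def growth_factors(w, c, n_max=60):
    """(p_n, r_n) pairs for the lottery, truncated at n_max (an assumption)."""
    return [(0.5 ** n, (w - c + 2.0 ** (n - 1)) / w) for n in range(1, n_max + 1)]


def estimate_with_subintervals(pairs, q):
    """Right-hand side of Eq. (5.16) at finite q: sum_n p_n * q * (r_n**(1/q) - 1).

    Assumes each return r_n occupies exactly the fraction p_n of the q
    sub-intervals, i.e. k_n = p_n * q.
    """
    return sum(p * q * (r ** (1.0 / q) - 1.0) for p, r in pairs)


def expected_log(pairs):
    """Right-hand side of Eq. (5.17): sum_n p_n * ln r_n."""
    return sum(p * math.log(r) for p, r in pairs)


if __name__ == "__main__":
    pairs = growth_factors(w=100.0, c=10.0)
    target = expected_log(pairs)
    for q in (1, 10, 100, 10_000, 1_000_000):
        est = estimate_with_subintervals(pairs, q)
        print(f"q = {q:>9}: estimate = {est:+.8f}, error = {est - target:+.2e}")
```

At q = 1 the estimate reduces to the truncated expectation value of r - 1, which grows without bound as `n_max` is increased, whereas for large q the estimate settles on the expectation value of ln r.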

† The approximation \(\ln(r_1) \approx r_1 - 1\) is permissible here because we will consider infinitesimal
time steps such that exp(g dt) = 1 + g dt will be exact.

‡ The subscript j is used here instead of i because it refers not to one round but to sub-intervals
of one round.

6. Relation to Bernoulli's resolution

Equation (5.10) is mathematically equivalent to Bernoulli's use of logarithmic utility.
Bernoulli argued behaviorally that instead of the expectation value of monetary
gain, the expectation value of the gain in a loosely constrained function (the util- 
ity) of wealth should be considered. One of the allowed functions is the logarithm, 
which has the special property of encoding the multiplicative nature common to 
gambling and investing in a linear additive object, the expectation value 



\[ \sum_{n=1}^{\infty} p_n \ln r_n = \ln \lim_{T\to\infty} \left( \prod_{i=1}^{T} r_i \right)^{1/T}. \tag{6.1} \]




Inadvertently, by postulating logarithmic utility (left-hand side of (Eq. 6.1)), Bernoulli
replaced the ensemble-average winnings with the time-average exponential growth
rate in a multiplicative non-ergodic stochastic process (right-hand side of (Eq. 6.1)).
Bernoulli did not make the time argument, as is evident from his acceptance
of Cramer's square-root utility, which does not have the property of (Eq. 6.1):
\(\sum_{n=1}^{\infty} p_n \sqrt{r_n}\) cannot be written as a similar product. This is problematic because
the arbitrariness of utility can be abused to justify reckless behavior, and it ignores
the fundamental physical limits, given by time irreversibility, to what can be considered
reasonable. But because Bernoulli's early work postulated the logarithm,
many consequences of (Eq. 5.10) have already been discussed in the literature under
the heading of "logarithmic utility".

A different heading for these results is "Kelly criterion" (Kelly Jr., 1956; Cover and Thomas,
1991; Thorp, 2006). In contrast to ensemble-average exponential growth rates,
which often diverge (for instance with leverage), time-average exponential growth
rates can be optimized (Peters, 2009, 2010). Kelly Jr. (1956) used this fact to optimize
wager sizes in a hypothetical horse race using private information. While
he refrained from using utilities because he deemed them "too general to shed any 
light on the specific problems" he considered, he did not point out the fundamental 
difference in perspective his treatment implies: in essence, arbitrary utility func- 
tions are replaced by the physical truth that time cannot be reversed. My aim here 
is to emphasize this difference in perspective. It is crucial that logarithmic utility 
from this point of view is not a utility at all. Rather, the logarithm accounts for 
the multiplicative nature of the process: the ensemble average of the logarithm of 
growth factors equals the logarithm of the time average of growth factors. 
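
The contrast between the two kinds of growth rate can be seen in the simplest setting. The sketch below is not Kelly's horse race but a hypothetical binary bet, introduced purely for illustration: a fraction f of wealth is wagered, the bet pays b-to-1 with probability p, and the stake is lost otherwise. For this bet the time-average growth rate has an interior maximum at f = p - (1 - p)/b, while the ensemble-average growth rate keeps increasing with f whenever the bet is favorable. All function names are my own.

```python
import math


def time_average_growth(f, p, b):
    """Time-average exponential growth rate per bet,
    p*ln(1 + f*b) + (1 - p)*ln(1 - f), for a hypothetical binary bet
    (fraction f wagered, pays b-to-1 with win probability p)."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


def ensemble_average_growth(f, p, b):
    """Logarithm of the expected growth factor, ln(1 + f*(p*b - (1 - p)))."""
    return math.log(1.0 + f * (p * b - (1.0 - p)))


def best_fraction(p, b, steps=1000):
    """Grid search over f in [0, 1) for the maximum of the time average."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda f: time_average_growth(f, p, b))


def kelly_fraction(p, b):
    """Closed-form maximizer f* = p - (1 - p)/b for this binary bet."""
    return p - (1.0 - p) / b


if __name__ == "__main__":
    p, b = 0.6, 1.0
    print("grid optimum:", best_fraction(p, b))
    print("closed form :", kelly_fraction(p, b))
    for f in (0.2, 0.5, 0.9):
        print(f"f = {f}: time average = {time_average_growth(f, p, b):+.4f}, "
              f"ensemble average = {ensemble_average_growth(f, p, b):+.4f}")
```

The grid search is used instead of the closed form only to show that the optimum emerges from the time-average growth rate itself, without reference to a utility function.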

Comparing (Eq. 5.10) and (Eq. 3.1), it is tempting to say that the time average
justifies logarithmic utility. I advise against this interpretation because it conflates 
physical concepts of time with behavioral concepts of usefulness. Any utility func- 
tion other than the logarithm leads to recommendations that do not agree with the 
time perspective. 

(a) Menger's objection to unbounded utility 
Bernoulli (1738) did not actually write down (Eq. 3.1), although it is often
assumed that that was his intention. Instead of using the criterion (Eq. 3.1), he
argued in two steps "how large a stake an individual should be willing to venture",
pp. 26-27.

1. The expected gain in utility is calculated without explicitly taking the ticket
price into account

\[ \sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^n \ln(w + 2^{n-1}) - \ln(w). \tag{6.2} \]

2. This is followed by the statement that "the stake more than which persons 
[...] should not venture" is that ticket price, $c, which satisfies 

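\[ \underbrace{\sum_{n=1}^{\infty} \left( \frac{1}{2} \right)^n \left( \ln(w + 2^{n-1}) - \ln(w) \right)}_{\text{expected utility gain with } c = 0} - \underbrace{\left[ \ln(w) - \ln(w - c) \right]}_{\text{utility loss at purchase}} = 0. \tag{6.3} \]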



This is not the condition that defines the line in the main panel of Fig. 1 as it
does not imply that the expected gain in utility, (Eq. 3.1), is zero. In this sense
(Eq. 3.1) is an inaccurate, although generally accepted and sensible, representation
of Bernoulli's work. The difference between (Eq. 3.1) and (Eq. 6.3) and the conflicting
interpretations of these equations are consequences of the aforementioned
arbitrariness of the utility framework.
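
The difference between the two criteria can be made explicit by computing the maximum ticket price each one implies. The sketch below assumes that (Eq. 3.1) has the same form as (Eq. 5.10), as the caption of Fig. 1 indicates, and solves (Eq. 6.3) in closed form as c = w(1 - exp(-U)), where U is the expected utility gain of (Eq. 6.2). This rearrangement, the bisection, the truncation `n_max` and all function names are my own illustrative choices.

```python
import math


def utility_gain_free_ticket(w, n_max=200):
    """Eq. (6.2): expected gain in logarithmic utility at zero ticket price."""
    return sum(0.5 ** n * math.log(w + 2.0 ** (n - 1))
               for n in range(1, n_max + 1)) - math.log(w)


def price_bernoulli_literal(w):
    """Price c solving Eq. (6.3): the gain of Eq. (6.2) equals
    ln(w) - ln(w - c), hence c = w * (1 - exp(-gain))."""
    return w * (1.0 - math.exp(-utility_gain_free_ticket(w)))


def expected_utility_change(w, c, n_max=200):
    """Assumed form of Eq. (3.1): sum_n (1/2)**n * [ln(w + 2**(n-1) - c) - ln(w)]."""
    if w - c + 1 <= 0:
        return -math.inf
    return sum(0.5 ** n * (math.log(w + 2.0 ** (n - 1) - c) - math.log(w))
               for n in range(1, n_max + 1))


def price_zero_utility_change(w, tol=1e-12):
    """Price c at which the assumed Eq. (3.1) vanishes, by bisection on (0, w + 1)."""
    lo, hi = 0.0, w + 1.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_utility_change(w, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    w = 100.0
    print("Eq. (6.3), literal criterion:", price_bernoulli_literal(w))
    print("Eq. (3.1), zero utility change:", price_zero_utility_change(w))
```

For the same wealth the two functions return different prices, which is the discrepancy discussed above.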

Menger (1934) claimed that using logarithmic utility as Bernoulli did, (Eq. 6.2)
and (Eq. 6.3), does not resolve modified versions of the paradox where payouts
$f(n) as a function of waiting time n increase faster than according to Bernoulli's
original \(f_B(n) = 2^{n-1}\). Specifically, Menger considered \(f_M(n) = w \exp(2^n) - w\).
Note that this function is curiously defined in terms of the initial wealth. His first
step closely mirrors Bernoulli's first step, but then he jumps to an untenable conclusion:



1. Menger (1934) pointed out that when \(2^{n-1}\) in (Eq. 6.2) is replaced by \(f_M(n)\), the
expected gain in logarithmic utility at zero ticket price diverges.

2. He argued that the paradox remains unresolved because "it is clear that also 
in the modified St. Petersburg game no normal person will risk a large amount 
or even his fortune as a wager", p. 468, my translation, and generalized to 
conclude that this formally prohibits the use of any unbounded utility func- 
tion. 

The meaning of the second statement is unclear. A player who pays "his fortune" 
for a ticket in Menger's lottery and then experiences a heads-event on the first coin 
toss, i.e. the worst possible outcome, will still gain since the worst-case payout, 
$f_M(1) = $w exp(2) - $w, is more than the initial wealth, $w. For a person to risk
losing anything at all, the ticket price has to be $c > $f_M(1), far greater than the
person's wealth. For a person to risk losing his entire wealth, the ticket price has
to be greater still, $c > $f_M(1) + $w = $w exp(2). But at such prices (Eq. 6.3) is
undefined. 

Perhaps Menger meant that Bernoulli's condition (Eq. 6.3) cannot be satisfied,
and the price one should be willing to pay is infinite. In that case the second
part of Menger's argument implicitly assumes that the positive divergence in the
first part cannot be offset by anything else. But as the ticket price approaches the
player's wealth, \(c \to w\), the utility loss at purchase in (Eq. 6.3) represents another
divergence, implying that this argument is inconclusive. It will now be shown to be 
invalid. 

Menger's objection to Bernoulli could hold in the following sense: the left-hand 
side of (Eq. 6.3), using Menger's payout function, diverges positively if c < w and is
undefined otherwise. But it is never zero (or negative) - when does it warn against 
the gamble? To understand what the undefined regime signifies, one has to study 
the process of divergence and compare the two infinities as they unfold. 

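For any finite \(n_{\max}\), a finite value c < w does exist which renders the corresponding
partial sum zero,

\[ \forall\, n_{\max} < \infty \;\; \exists\, c < w : \quad \sum_{n=1}^{n_{\max}} \left( \frac{1}{2} \right)^n 2^n - \left[ \ln(w) - \ln(w - c) \right] = 0. \tag{6.4} \]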

To ensure positivity up to exactly c = w, where the expression becomes undefined,
events of zero probability have to be taken into account. For any non-zero
lower bound on probability Bernoulli's condition can be satisfied. In this sense, in
the literal original Bernoulli-setup, values of \(c \ge w\), where (Eq. 6.4) is undefined,
correspond to the recommendation not to buy a ticket, and the paradox is resolved. 
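
The way the two infinities unfold can be followed explicitly. With Menger's payout each term of the partial sum in (Eq. 6.4) equals (1/2)^n 2^n = 1, so the sum is n_max and the condition can be solved for the price, c = w(1 - exp(-n_max)). This closed form is my own rearrangement of (Eq. 6.4), and the function names below are hypothetical.

```python
import math


def menger_price(w, n_max):
    """Ticket price c that makes the partial sum in Eq. (6.4) vanish.

    With Menger's payout f_M(n) = w*exp(2**n) - w, each term of the sum is
    (1/2)**n * 2**n = 1, so the partial sum equals n_max and the condition
    n_max = ln(w) - ln(w - c) gives c = w * (1 - exp(-n_max)).
    """
    return w * (1.0 - math.exp(-n_max))


def partial_sum_residual(w, c, n_max):
    """Left-hand side of Eq. (6.4), evaluated term by term."""
    gain = sum(0.5 ** n * 2.0 ** n for n in range(1, n_max + 1))
    return gain - (math.log(w) - math.log(w - c))


if __name__ == "__main__":
    w = 100.0
    for n_max in (1, 2, 5, 10):
        c = menger_price(w, n_max)
        print(f"n_max = {n_max:>2}: c = {c:.6f}, residual = "
              f"{partial_sum_residual(w, c, n_max):+.1e}")
```

The price stays below w for every finite n_max and approaches w as n_max grows. In double precision the difference w - c is lost to rounding once n_max exceeds roughly 37, so the residual should only be evaluated for moderate n_max.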

Menger's conclusion is incorrect. Bernoulli's logarithmic utility recommends
purchasing tickets as long as they cost less than the player's wealth, implying a significant
minimum gain - a course many a "normal person" may wish to pursue. The
criterion could be criticized for the opposite reason: it rejects prices that guarantee 
a win, even in the worst case. 

The time resolution produces the criterion in Theorem 5.2, which is equivalent
to (Eq. 3.1) and not to Bernoulli's literal original criterion (Eq. 6.3). Consequently,
it yields a different recommendation, which may at first appear surprising but turns
out also to correspond to reasonable behavior given the assumptions on which it
is based: it recommends purchasing a ticket at any price that cannot lead to
bankruptcy. The player could be left with an arbitrarily small positive wealth after 
one round. The recommendation may be followed by a "normal person" because of 
the assumption that equivalent lotteries can be played in sequence as often as de- 
sired. Under these conditions, irrespective of how close a player gets to bankruptcy, 
losses will be recovered over time. Of course, if these conditions are violated, the 
time resolution does not apply. This last statement is another warning against the 
naive use of mathematics, whose truth is always restricted to the context of ax- 
ioms or assumptions. Applicability reflects the degree to which assumptions are 
representative of real conditions in a given situation. While ensemble averages are 
meaningless in the absence of samples (here parallel rounds), time averages are 
meaningless in the absence of time (here sequences of equivalent rounds). 



7. Discussion 



Excessive risk is to be avoided primarily because we cannot go back in time. Be- 
havioral aspects and personal circumstances are relevant on a different level - they 
can change and do not immediately follow from the laws of physics. 

The perspective described here has consequences far beyond the St. Petersburg
paradox, including investment decisions (Peters, 2009, 2010) as well as macroeconomic
processes. For example, it is sensible for a nation striving for growth to
encourage risks that lead to occasional bankruptcies of companies and individuals. 
How large a risk is macroeconomically sensible? What are the moral implications? 
Does gross domestic product - a linear sum, similar to a sample average - measure 
what one should be interested in? While the St. Petersburg lottery is an extreme 
case, (Eq. 5.10) and Fig. 1 carry a more general message: if net losses are possible,
the negative time-average exponential growth rate for small wealth, w, turns pos- 
itive as w increases, implying higher exponential growth rates for larger entities. 
In a collection of such entities inequality has a tendency to increase, letting large 
entities dominate and monopolies arise. This can cause markets to cease functioning,
as competition is compromised or corporations become "too big to fail". There
is anecdotal evidence that assets in stock markets have become more correlated 
in recent decades, and effective diversification (which mimics ensembles) harder 
to achieve. This would make the time perspective even more important, and the 
consequences of ignoring it more dramatic. 

Utility functions are externally provided to represent risk preferences but are 
unable by construction to recommend appropriate levels of risk. The framework is 
self-referential in that it can only translate a given utility function into actions that 
are optimal with respect to that same utility function. This can have unwanted 
consequences. For example, leverage or credit represents a level of risk which needs 
to be optimized, but current incentive structures in the financial industry can encourage
exceeding the optimal risk. Adam Smith (1776), cited in (Foley, 2006),
warned that excessive lending - in his case based on bills of exchange for goods 
in transit - can lead to a collapse of the credit system, followed by bankruptcies 
and unemployment; insufficient lending, on the other hand, can lead to economic 
stagnation, two stages which often follow one another in a boom-bust cycle. To 
avoid both, good criteria for appropriate levels of risk are needed, which the utility 
framework cannot deliver. The time arguments presented here provide an objective 
null-hypothesis concept of optimality and can be used to optimize leverage under
a given set of conditions. In the present case, optimality based on
such considerations is a good approximation to practically optimal behavior. This
is evident from Bernoulli's work, whose justification of the almost identical result 
was practical validity. 

The proposed conceptual re-orientation may help reconcile economic theory with 
empirical regularities, such as risk aversion, known from behavioral economics. 

It is easy to construct examples where less risk should be taken than recommended
by the criterion in (Eq. 5.10). For example, some fraction of $w may already
be earmarked for other vital use. It is very difficult, however, to think of an exam- 
ple where greater risk is beneficial. For this reason, the time-perspective is a useful 
tool for finding upper limits on risk to be implemented in regulation, for instance 
as margin requirements, minimum capital requirements, or maximum loan-to-value
ratios. Of course, such regulation must take into account further arguments, but its 
least complex form can be based on the time perspective. 

The epistemological situation is typical of the process of concept formation
(Lakatos, 1976). As the conceptual context changed, in this case from moral to
predictive, the original definition of the term "expectation" began to lead to para- 
doxical conclusions. From today's conceptually later perspective it appears that N. 
Bernoulli made a hidden assumption, namely the assumption, explicitly stated by
Huygens (1657), that "it is the same thing" to receive 5 shillings as it is to have an
equal chance of receiving either 3 or 7 shillings. Lakatos (1976) points out that it
can be hard to imagine in retrospect that an eminent mathematician made a hidden 
assumption, which is often perceived as an error. He writes on p. 46, "while they 
[the hidden assumptions] were in your subconscious, they were listed as trivially 
true - the ...[paradox] however made them summersault into your conscious list 
as trivially false." Similarly, it seems trivially true at first that the expected gain 
from playing the lottery should be the criterion for participating - taking time into 
account makes this assumption summersault into being trivially false. 

Thus the St. Petersburg paradox relies for its existence on the assumption that 
the expected gain (or growth factor or exponential growth rate) is the relevant quan- 
tity for an individual deciding whether to take part in the lottery. This assumption 
can be shown to be implausible by carefully analyzing the physical meaning of the 
ensemble average. A quantity that is more directly relevant to the financial well- 
being of an individual is the growth of an investment over time. Utility, which can 
obscure risks, is not necessary to evaluate the situation and resolve the paradox. It 
is the actual wealth, in $, of a player, not the utility, that grows with g, (Eq. 5.10). It
is manifestly not true that the commonly used ensemble-average performance of the 
lottery equals the time-average performance. In this sense the system is not ergodic, 
and statements based on anything other than measures of the actual time-average 
performance must be interpreted carefully.